
Add Fortio-Envoy optimization guide#29

Open
vaibhavk2 wants to merge 3 commits into intel:main from vaibhavk2:envoy

Conversation

@vaibhavk2

Add Fortio-Envoy optimization guide and related documentation.

vaibhavk2 added 3 commits May 1, 2026 09:12
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
Comment thread software/envoy/README.md

## Overview

This setup evaluates Envoy running as a TCP proxy in front of Fortio, which acts as the backend load generator. The benchmark focuses on proxy-path performance and behavior under load, measuring metrics such as QPS and latency. Both server-side and client-side components are used to generate traffic and collect results, with Envoy and Fortio running in Docker containers based on the images listed below:
Collaborator


As a first-time reader, the topology is unclear. Is it possible to provide a diagram explaining the setup?
It is not clear what runs as the server vs. the client.
Is Fortio being used as both the load generator and the server?
Is the client on a different host?
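A rough sketch of how the topology reads to me, with Fortio serving as both the backend and the load generator; the container names, ports layout, and flags below are illustrative placeholders rather than the actual benchmark script:

```bash
# Backend (server host): Fortio in server mode on port 8080, with the CPU quota from the README
docker run -d --network host --cpus 16 --name fortio-server \
  fortio/fortio server -http-port 8080

# Proxy (server host): Envoy TCP proxy listening on 9090, forwarding to Fortio on 8080
docker run -d --network host --cpus 8 --name envoy-proxy \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" envoyproxy/envoy

# Client (load-generator host): Fortio driving traffic at the proxy port
docker run --rm fortio/fortio load -qps 0 -c 64 -t 60s http://<server-host>:9090/
```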

Comment thread software/envoy/README.md

## Overview

This setup evaluates Envoy running as a TCP proxy in front of Fortio, which acts as the backend load generator. The benchmark focuses on proxy-path performance and behavior under load, measuring metrics such as QPS and latency. Both server-side and client-side components are used to generate traffic and collect results, with Envoy and Fortio running in Docker containers based on the images listed below:
Collaborator


Consider starting with an overview, followed by paragraphs explaining the two components, and then the topology/setup used.

Overview

This tuning guide describes best known practices to optimize performance... when you run Fortio and Envoy...

Fortio

Envoy

Topology/setup

Comment thread software/envoy/README.md

## CPU Utilization and CPU Quota

The script applies Docker CPU quotas (`--cpus 16` for Fortio, `--cpus 8` for Envoy). On a high core-count server (e.g., 128 cores / 256 threads), Docker enforces these quotas via cgroup CPU bandwidth control. The OS spreads threads across all cores but throttles aggregate CPU time, resulting in roughly **6-7% per-core utilization** across all server cores - not saturation. The CPU quota is the binding constraint, not the workload.
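If it helps to see the throttling mechanism directly, the quota can be inspected on the host; a minimal sketch assuming cgroup v2 and the hypothetical container name `envoy-proxy`:

```bash
# Docker stores --cpus 8 as NanoCpus = 8000000000, which becomes a cgroup CPU
# bandwidth limit of cpu.max = "800000 100000" (800 ms of CPU time per 100 ms period)
docker inspect --format '{{.HostConfig.NanoCpus}}' envoy-proxy

# Throttling counters (nr_throttled, throttled_usec); the cgroup path depends on the
# host's cgroup driver, this is the common systemd layout
cat "/sys/fs/cgroup/system.slice/docker-$(docker inspect --format '{{.Id}}' envoy-proxy).scope/cpu.stat"
```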

Comment thread software/envoy/README.md
- **Event-driven, non-blocking I/O**: Each worker thread runs an independent libevent loop.
- **`--concurrency N`**: Spawns N worker threads. Each thread owns its own listener socket and connection pool, so there is near-zero cross-thread coordination for established connections.
- **TCP proxy mode** (used here): Envoy accepts a TCP connection on port 9090, opens a connection to Fortio on 8080, and shuttles bytes between them. No L7 parsing overhead.
- **In mesh mode (`SECURE_MESH=true`)**: Adds mTLS - Envoy terminates the downstream TLS connection and re-originates a new TLS connection upstream, roughly doubling the cryptographic work per connection.
Collaborator


Earlier in the document you described SECURE_MESH=true as “no Envoy sidecars, raw application performance” (direct mode), but here SECURE_MESH=true means Envoy terminates the TLS connection?
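On the threading bullets above, a hedged example of what the `--concurrency` knob looks like at the command line (invocation and names are illustrative, not the benchmark script's exact flags):

```bash
# 8 Envoy worker threads; per the bullets above, each worker runs its own event
# loop and owns its own listener socket, so established connections stay on one worker
docker run -d --name envoy-proxy --cpus 8 \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" \
  envoyproxy/envoy envoy -c /etc/envoy/envoy.yaml --concurrency 8
```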

Comment thread software/envoy/README.md

2. **Cache coherency traffic**: Spin locks and atomic CAS operations on shared scheduler state cause cache line bouncing across all sockets. On a multi-socket NUMA system, cross-socket coherency traffic adds latency to every lock acquisition and scales with core count.

3. **Kernel paths involved** (from perf flame graphs):
Collaborator


Is it possible to upload a flamegraph for reference?
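For reference, flame graphs like these are typically captured with perf plus Brendan Gregg's FlameGraph scripts; a sketch, assuming the scripts are cloned locally and with the process lookup as a placeholder:

```bash
# Sample kernel and user stacks of the Envoy process at 99 Hz for 30 seconds
perf record -F 99 -g -p "$(pgrep -o envoy)" -- sleep 30

# Fold the stacks and render an SVG flame graph
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > envoy-flame.svg
```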

Comment thread software/envoy/README.md

### 1. NUMA Pinning (Most Impactful)

Pin both Fortio and Envoy to a single NUMA node. This is the single most impactful optimization - it substantially reduces `native_queued_spin_lock_slowpath` overhead by keeping all memory allocations, thread migrations, and NIC interrupts on the same socket.
Collaborator


"Pin both Fortio and Envoy to a single NUMA node": do we mean on the server host?
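Assuming it is the server host, a sketch of what the pinning could look like with Docker's cpuset flags; core ranges and names below are placeholders, and the real layout should be checked with `lscpu` or `numactl -H`:

```bash
# Keep CPUs and memory allocations of both containers on NUMA node 0
docker run -d --network host --cpus 16 --name fortio-server \
  --cpuset-cpus 0-15 --cpuset-mems 0 \
  fortio/fortio server -http-port 8080

docker run -d --network host --cpus 8 --name envoy-proxy \
  --cpuset-cpus 16-23 --cpuset-mems 0 \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" envoyproxy/envoy

# NIC interrupt affinity is tuned separately, e.g. via /proc/irq/<irq>/smp_affinity_list
```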

